Skip to content

Implement non-utf8 string to bytes in OTLP exporters.#3991

Open
owent wants to merge 4 commits intoopen-telemetry:mainfrom
owent:validate_utf8_string_in_otlp
Open

Implement non-utf8 string to bytes in OTLP exporters.#3991
owent wants to merge 4 commits intoopen-telemetry:mainfrom
owent:validate_utf8_string_in_otlp

Conversation

@owent
Copy link
Copy Markdown
Member

@owent owent commented Apr 12, 2026

Fixes ##3491

Changes

  • Mapping non UTF8 string to bytes in OLTP exporters.

https://github.com/open-telemetry/opentelemetry-specification/blame/v1.55.0/specification/common/attribute-type-mapping.md.
Reimplement #3512 to meet the latest spec.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

Copilot AI review requested due to automatic review settings April 12, 2026 15:55
@owent owent requested a review from a team as a code owner April 12, 2026 15:55
@owent owent force-pushed the validate_utf8_string_in_otlp branch from 2299799 to f66b834 Compare April 12, 2026 15:58
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the OTLP exporter attribute/value population logic to validate UTF-8 for string-like inputs and encode non-UTF8 values as bytes_value, aligning exporter output with the latest spec’s non-UTF8 mapping guidance. It also introduces a protobuf-sourced UTF-8 validation dependency to support this conversion in both CMake and Bazel builds.

Changes:

  • Add UTF-8 validation in OtlpPopulateAttributeUtils and convert invalid string inputs to bytes_value.
  • Remove the allow_bytes parameter from PopulateAnyValue/PopulateAttribute and update all call sites accordingly.
  • Add a new UTF-8 validity dependency wiring for OTLP (CMake + Bazel) and update OTLP log recordable tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
exporters/otlp/src/otlp_populate_attribute_utils.cc Adds UTF-8 validation and maps invalid strings to bytes_value; updates array-string handling similarly.
exporters/otlp/include/opentelemetry/exporters/otlp/otlp_populate_attribute_utils.h Updates public helper signatures to remove allow_bytes.
exporters/otlp/src/otlp_log_recordable.cc Updates log body/attribute population calls to new helper signatures.
exporters/otlp/src/otlp_recordable.cc Updates span attribute/event/link attribute population calls to new helper signatures.
exporters/otlp/src/otlp_metric_utils.cc Updates metric point attribute population calls to new helper signatures.
exporters/otlp/test/otlp_log_recordable_test.cc Updates tests to assert non-UTF8 string_view becomes bytes_value.
exporters/otlp/CMakeLists.txt Adds protobuf-sourced utf8_range/validity library integration and links OTLP recordable against it.
exporters/otlp/BUILD Adds a Bazel helper library to bring in protobuf’s utf8_range/utf8_validity deps and wires it into otlp_recordable.
Comments suppressed due to low confidence (1)

exporters/otlp/CMakeLists.txt:149

  • opentelemetry_otlp_recordable now links against ${OPENTELEMETRY_OTLP_DEP_UTF8_VALIDITY}, but the generated otlp_recordable pkg-config module doesn’t list a corresponding Requires: entry. This can break pkg-config consumers at link time (missing -lopentelemetry_otlp_utf8_validity when the bundled target is used). Consider conditionally adding opentelemetry_otlp_utf8_validity to the opentelemetry_add_pkgconfig(otlp_recordable, ...) requires list when that target is built.
  opentelemetry_add_pkgconfig(
    otlp_recordable
    "OpenTelemetry OTLP - Recordable"
    "OTLP recordable implementation for spans, metrics, and logs."
    "opentelemetry_trace opentelemetry_metrics opentelemetry_logs opentelemetry_resources"
  )

Comment thread exporters/otlp/CMakeLists.txt Outdated
Comment thread exporters/otlp/CMakeLists.txt Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_log_recordable.cc Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.06%. Comparing base (dbf6567) to head (bc87985).

Files with missing lines Patch % Lines
...xporters/otlp/src/otlp_populate_attribute_utils.cc 80.00% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3991      +/-   ##
==========================================
+ Coverage   82.06%   82.06%   +0.01%     
==========================================
  Files         385      385              
  Lines       15891    15900       +9     
==========================================
+ Hits        13039    13047       +8     
- Misses       2852     2853       +1     
Files with missing lines Coverage Δ
...xporters/otlp/src/otlp_populate_attribute_utils.cc 92.91% <80.00%> (-1.78%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@dbarker dbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feature @owent! Please see my comments below.

Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/CMakeLists.txt Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc Outdated
@owent owent changed the title Implement non-utf8 string to bytes in OTLP exporters. [WIP] Implement non-utf8 string to bytes in OTLP exporters. Apr 18, 2026
@owent owent force-pushed the validate_utf8_string_in_otlp branch 2 times, most recently from 4ba4995 to 5de988c Compare April 18, 2026 13:52
@owent owent changed the title [WIP] Implement non-utf8 string to bytes in OTLP exporters. Implement non-utf8 string to bytes in OTLP exporters. Apr 19, 2026
Copy link
Copy Markdown
Member

@dbarker dbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the feature. Please see a few minor cleanup questions below.

Comment thread cmake/protobuf.cmake
Comment thread exporters/otlp/CMakeLists.txt Outdated
Copy link
Copy Markdown
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

This PR actually implements the specification, which should be a good thing in general.

In this particular case, and this is an opinion, the spec is a bad idea.

Please consider to add an option to opt-out and not add this extra UTF-8 validation, for applications that are instrumented correctly (strings are string, bytes sequences are binary), which do not need the overhead.

Comment thread exporters/otlp/src/otlp_populate_attribute_utils.cc
}
else
{
proto_value->set_bytes_value(str_value, strlen(str_value));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, strlen means the instrumented application can send binary in a string, as long as the binary does not contains a nul character ...

A properly instrumented application will need to provide a sequence of bytes instead of a string anyway, to work properly.

But this change (which faithfully implement the specs) makes every application slower, due to the added UTF-8 validation, to try to accommodate poorly instrumented applications using the wrong types.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We receive a const char* that we assume is null-terminated, but it may contain invalid UTF-8.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should include <cstring> for strlen then?

Comment thread exporters/otlp/CMakeLists.txt Outdated
Copy link
Copy Markdown
Contributor

@ThomsonTan ThomsonTan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider add a CHANGELOG entry for this feature work.

@owent owent force-pushed the validate_utf8_string_in_otlp branch from abd6f54 to 2999fdb Compare April 26, 2026 08:57
@owent owent requested review from ThomsonTan and marcalff April 26, 2026 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants